The ISL Baseline Lecture Transcription System for the TED Corpus
Abstract
This paper describes the Interactive Systems Laboratories’ automatic lecture transcription system for the Translanguage English Database (TED) corpus, which provides text hypotheses for the International Workshop on Speech Summarization for Information Extraction and Machine Translation. The paper also gives a short analysis of speaking-style characteristics, in particular comparing native and non-native speech. The data for automatic transcription is divided into a native and a non-native test set. The best word error rate is 28.5% on the native test set and 31.0% on the non-native test set.
Similar resources
The LIMSI RT06s Lecture Transcription System
This paper describes recent research carried out in the context of the FP6 Integrated Project CHIL in developing a system to automatically transcribe lectures and presentations. Widely available corpora were used to train both the acoustic and language models, since only a small amount of CHIL data was available for system development. Acoustic model training made use of the transcribed portion...
The LIMSI RT07 Lecture Transcription System
A system to automatically transcribe lectures and presentations has been developed in the context of the FP6 Integrated Project CHIL. In addition to the seminar data recorded by the CHIL partners, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as th...
Language modeling and transcription of the TED corpus lectures
Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using respectively 8 hours of TED transcripts and ...
Transcribing lectures and seminars
This paper describes recent research carried out in the context of the FP6 Integrated Project CHIL in developing a system to automatically transcribe lectures and seminars. We made use of widely available corpora to train both the acoustic and language models, since only a small amount of CHIL data was available for system development. Acoustic model training made use of the transcribed po...
Advances in lecture recognition: the ISL RT-06s evaluation system
This paper describes the 2006 lecture recognition system developed at the Interactive Systems Laboratories (ISL) for the individual head-microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions. It was evaluated in the RT-06S Rich Transcription meeting evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the pri...